Script to summarise and visualise information for user-defined genes from multiple Mutation Annotation Format (MAF) files using maftools R package.
Make sure to set the java max heap size to 2Gb to accomodate big gene tables written into excel spreadsheet using the xlsx R package
##### Set the jave max heap size to 2Gb to accomodate big gene tables
options( java.parameters = "-Xmx2000m" )
suppressMessages(library(knitr))
suppressMessages(library(maftools))
suppressMessages(library(xlsx))
suppressMessages(library(optparse))
suppressMessages(library(DT))
Go to the MAF files directory, read the files and create directory for output files, if does not exist already
##### Read MAF files and put associated info into a list
mafFiles <- params$mafFiles
mafInfo <- vector("list", length(mafFiles))
for ( i in 1:length(mafFiles) ) {
mafInfo[[i]] = read.maf(maf = mafFiles[i], verbose = FALSE)
##### Subsets MAF based on queried genes
mafInfo[[i]] <- subsetMaf(maf=mafInfo[[i]], genes=params$genes, mafObj=TRUE)
}
## [1] NA
## [1] NA
## [1] NA
## [1] NA
##### Create directory for output files
outDir <- paste(params$mafDir, params$outDir, sep = "/")
if ( !file.exists(params$outDir) ){
dir.create(outDir)
}
Create tables for individual cohorts with information for queried genes (rows), including information about no. of different types of mutations (columns), including frameshift deletions, frameshift insertions, in-frame deletions, in-frame insertions, missense mutations, nonsense mutations, nonstop mutations, splice site mutations, translation start site mutations, as well as the total no. of mutations present in the MAF file. The last two columns contain the no. of samples with mutations/alterations in the corresponding gene.
##### Write gene summary into a file
cohorts.list <- params$cohorts
if ( !file.exists(paste(outDir, "MAF_gene_summary.xlsx", sep = "/")) ){
for ( i in 1:length(mafFiles) ) {
write.xlsx(maftools::getGeneSummary(mafInfo[[i]]), file=paste(outDir, "MAF_gene_summary.xlsx", sep="/"), sheetName=cohorts.list[i], row.names=FALSE, append=TRUE)
}
}
##### Present a gene table in the html report
##### Create a list for htmlwidgets
widges.list <- htmltools::tagList()
for ( i in 1:length(mafFiles) ) {
widges.list[[i]] <- DT::datatable(data = maftools::getGeneSummary(mafInfo[[i]]), caption = paste0("Genes summary: ", cohorts.list[i]), filter = "top")
}
##### Print a list of htmlwidgets
widges.list
This time generate an interactive heatmaps to summmarise queried genes information. Rows and columns represent genes and mutation types, respectively. The colour scale from blue to yellow indicates low and high number of various mutations types, respectively, observed in corresponding genes. Genes are ordered by the number of reported mutations. The total number of mutations in individual genes, as well as the number of samples with mutations are also presented in the last three columns.
suppressMessages(library(plotly))
suppressMessages(library(heatmaply))
##### Create a list for htmlwidgets
widges.list <- htmltools::tagList()
##### Display genes summary in a form of interactive heatmap
for ( i in 1:length(mafFiles) ) {
geneSummary <- data.frame(maftools::getGeneSummary(mafInfo[[i]]))
rownames(geneSummary) <-geneSummary[,"Hugo_Symbol"]
geneSummary <- subset(geneSummary, select=-c(Hugo_Symbol))
##### Cluster table by genes
#hr <- hclust(as.dist(dist(geneSummary, method="euclidean")), method="ward.D")
##### Generate interactive heatmap
#p <- heatmaply(geneSummary, main = paste0("Genes summary: ", cohorts.list[i]), Rowv=as.dendrogram(hr), Colv=NULL, scale="none", dendrogram="none", trace="none", hide_colorbar = TRUE, fontsize_row = 8, fontsize_col = 8) %>%
#layout(autosize = TRUE, width = 800, height = 800, margin = list(l=250, r=10, b=150, t=50, pad=4), showlegend = TRUE)
##### Generate interactive heatmap
p <- heatmaply(geneSummary, main = paste0("Genes summary: ", cohorts.list[i]), Rowv=NULL, Colv=NULL, scale="none", dendrogram="none", trace="none", hide_colorbar = FALSE, fontsize_row = 8, label_names=c("Gene","Mutation_type","Count")) %>%
layout(width = 900, height = 600, margin = list(l=150, r=10, b=150, t=50, pad=4), titlefont = list(size=16), xaxis = list(tickfont=list(size=10)), yaxis = list(tickfont=list(size=10)))
##### Add plot to the list for htmlwidgets
widges.list[[i]] <- as_widget(ggplotly(p))
##### Save the heatmap as html (PLOTLY)
htmlwidgets::saveWidget(as_widget(p), paste0(outDir, "/MAF_gene_summary_heatmap_", cohorts.list[i], ".html"), selfcontained = TRUE)
##### Plotly option
#p <- plot_ly(x = colnames(geneSummary), y = rownames(geneSummary), z = as.matrix(geneSummary), height = 600, type = "heatmap") %>%
#layout(title = paste0("Genes summary: ", cohorts.list[i]), autosize = TRUE, margin = list(l=150, r=10, b=100, t=100, pad=4), showlegend = TRUE)
#widges.list[[i]] <- ggplotly(p)
}
##### Detach plotly package. Otherwise it clashes with other graphics devices
detach("package:heatmaply", unload=FALSE)
detach("package:plotly", unload=FALSE)
##### Print a list of htmlwidgets
widges.list
Oncoplot illustrating different types of mutations observed across samples for the selected genes. The side and top bar-plots present the frequency of mutations in each gene and in each sample, respectively.
###### Generate separate plot for each cohort
for ( i in 1:length(mafFiles) ) {
cat(paste(cohorts.list[i], "cohort\n\n", sep=" "))
##### Drawing oncoplots for selected genes in each cohort
plot.new()
par(mar=c(4,4,2,0.5), oma=c(1.5,2,2,1))
maftools::oncoplot(maf = mafInfo[[i]], fontSize = 12, keepGeneOrder = TRUE, GeneOrderSort = TRUE)
}
## TCGA-PAAD cohort
## [1] NA NA NA NA NA NA NA NA NA NA NA NA NA
## ICGC-PACA-AU cohort
## [1] NA NA NA NA NA NA NA NA NA NA NA NA NA
## ICGC-PACA-AU-additional cohort
## [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## ICGC-PACA-CA cohort
## [1] NA NA NA NA NA NA NA NA NA NA NA NA NA
Somatic interactions plots facilitate detection of mutually exclusive or co-occurring set of genes. The somatic interactions function performs pair-wise Fisher’s Exact test to detect significant pair of genes and uses cometExactTest to identify potentially altered gene sets involving > 2 two genes (see CoMEt: a statistical approach to identify combinations of mutually exclusive alterations in cancer for more details). The colour key, indicating the Fisher’s Exact test p-value, is presented on the right-hand side. This part of the script also generates a table with Fisher’s Exact test and cometExactTest results.
###### Generate separate plot for each cohort
##### Create a list for somatic interactions info for each cohort
somInteractions <- vector("list", length(mafFiles))
for ( i in 1:length(mafFiles) ) {
cat(paste(cohorts.list[i], "cohort\n\n", sep=" "))
##### Drawing somatic interactions for selected genes in each cohort
somInteractions[[i]] <- maftools::somaticInteractions(maf = mafInfo[[i]], pvalue = c(0.05, 0.01), returnAll = TRUE, verbose = FALSE)
}
## TCGA-PAAD cohort
## ICGC-PACA-AU cohort
## ICGC-PACA-AU-additional cohort
## ICGC-PACA-CA cohort
Somatic interactions tables facilitate detection of mutually exclusive or co-occurring set of genes. The first set of tables, called gene-pairs, present results from pair-wise Fisher’s Exact test aiming to detect significant pair of mutually exclusive or co-occurring genes. The second set of tables, called gene sets, resent results from cometExactTest facilitating identification of potentially altered gene sets involving > 2 two genes (see CoMEt: a statistical approach to identify combinations of mutually exclusive alterations in cancer for more details).
###### Generate separate table for each cohort
##### Create a list for htmlwidgets for the somatic interactions table
widges.list <- htmltools::tagList()
for ( i in 1:length(mafFiles) ) {
if ( nrow(somInteractions[[i]]$pairs) != 0 ) {
##### Presenting a somatic interactions table (gene pairs) in the html report
widges.list[[i]] <- DT::datatable(data = as.data.frame(somInteractions[[i]]$pairs), caption = paste0("Gene pairs: ", cohorts.list[i]), filter = "top")
} else {
##### Warn if no interactions between selected genes were detected
cat(paste("No gene pairs were detected in ", cohorts.list[i], "cohort\n\n", sep=" "))
}
}
##### Print a list of htmlwidgets (gene pairs tables)
widges.list
widges.list <- htmltools::tagList()
for ( i in 1:length(mafFiles) ) {
if ( length(somInteractions[[i]]$gene_sets) != 0 ) {
##### Presenting a somatic interactions table (gene sets) in the html report
widges.list[[i]] <- DT::datatable(data = as.data.frame(somInteractions[[i]]$gene_sets), caption = paste0("Gene sets: ", cohorts.list[i]), filter = "top")
} else {
##### Warn if no interactions between selected genes were detected
cat(paste("No gene sets were detected in ", cohorts.list[i], "cohort\n\n", sep=" "))
}
}
##### Print a list of htmlwidgets (gene sets tables)
widges.list
##### Write somatic interactions table into a file
if ( !file.exists(paste(outDir, "MAF_somatic_interactions.xlsx", sep = "/")) ){
for ( i in 1:length(mafFiles) ) {
##### Ignore if no interactions between selected genes were detected
if ( nrow(somInteractions[[i]]$pairs) != 0 ) {
##### Gene pairs info
write.xlsx(as.data.frame(somInteractions[[i]]$pairs), file=paste(outDir, "MAF_somatic_interactions.xlsx", sep="/"), sheetName=paste0(cohorts.list[i], " (gene pairs)"), row.names=FALSE, append=TRUE)
}
if ( length(somInteractions[[i]]$gene_sets) != 0 ) {
##### Gene sets info
write.xlsx(as.data.frame(somInteractions[[i]]$gene_sets), file=paste(outDir, "MAF_somatic_interactions.xlsx", sep="/"), sheetName=paste0(cohorts.list[i], " (gene sets)"), row.names=FALSE, append=TRUE)
}
}
}
###### Generate separate file with plots for each cohort
for ( i in 1:length(mafFiles) ) {
pdf( file = paste(outDir, "/MAF_summary_genes_", cohorts.list[i], ".pdf", sep = "") )
##### Drawing oncoplots for the top 10 genes in each cohort
plot.new()
par(mar=c(4,4,2,0.5), oma=c(1.5,2,2,1))
oncoplot(maf = mafInfo[[i]], fontSize = 12, keepGeneOrder = TRUE, GeneOrderSort = TRUE)
##### Drawing somatic interactions for selected genes in each cohort
maftools::somaticInteractions(maf = mafInfo[[i]], pvalue = c(0.05, 0.01), returnAll = TRUE, verbose = FALSE)
}
## [1] NA NA NA NA NA NA NA NA NA NA NA NA NA
## [1] NA NA NA NA NA NA NA NA NA NA NA NA NA
## [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [1] NA NA NA NA NA NA NA NA NA NA NA NA NA
dev.off()
## quartz_off_screen
## 2
for ( i in 1:length(params) ) {
cat(paste("Parameter: ", names(params)[i], "\nValue: ", paste(unlist(params[i]), collapse = ","), "\n\n", sep=""))
}
## Parameter: mafDir
## Value: /Users/jmarzec/data/mutation
##
## Parameter: mafFiles
## Value: /Users/jmarzec/data/mutation/PAAD.tcga.uuid.curated.somatic.clean.maf,/Users/jmarzec/data/mutation/PACA-AU.icgc.simple_somatic_mutation.maf,/Users/jmarzec/data/mutation/DCC17_PDAC_Not_in_DCC.maf,/Users/jmarzec/data/mutation/PACA-CA.icgc.simple_somatic_mutation.maf
##
## Parameter: cohorts
## Value: TCGA-PAAD,ICGC-PACA-AU,ICGC-PACA-AU-additional,ICGC-PACA-CA
##
## Parameter: genes
## Value: KRAS,SMAD4,TP53,CDKN2A,ARID1A,BRCA1,BRCA2
##
## Parameter: outDir
## Value: MAF_summary_genes
sessionInfo()
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.4
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] viridis_0.5.1 viridisLite_0.3.0 ggplot2_2.2.1.9000
## [4] DT_0.4 knitr_1.20 optparse_1.4.4
## [7] xlsx_0.5.7 xlsxjars_0.6.1 rJava_0.9-9
## [10] maftools_1.6.07 Biobase_2.40.0 BiocGenerics_0.26.0
##
## loaded via a namespace (and not attached):
## [1] changepoint_2.2.2 backports_1.1.2
## [3] circlize_0.4.3 NMF_0.21.0
## [5] plyr_1.8.4 lazyeval_0.2.1
## [7] splines_3.5.0 BiocParallel_1.14.1
## [9] crosstalk_1.0.0 GenomeInfoDb_1.16.0
## [11] gridBase_0.4-7 digest_0.6.15
## [13] foreach_1.4.4 htmltools_0.3.6
## [15] gdata_2.18.0 magrittr_1.5
## [17] memoise_1.1.0 BSgenome_1.48.0
## [19] cluster_2.0.7-1 doParallel_1.0.11
## [21] gclus_1.3.1 ComplexHeatmap_1.18.0
## [23] Biostrings_2.48.0 wordcloud_2.5
## [25] matrixStats_0.53.1 prettyunits_1.0.2
## [27] colorspace_1.3-2 blob_1.1.1
## [29] ggrepel_0.8.0 dplyr_0.7.5
## [31] RCurl_1.95-4.10 jsonlite_1.5
## [33] bindr_0.1.1 survival_2.42-3
## [35] VariantAnnotation_1.26.0 zoo_1.8-1
## [37] iterators_1.0.9 glue_1.2.0
## [39] registry_0.5 gtable_0.2.0
## [41] zlibbioc_1.26.0 XVector_0.20.0
## [43] webshot_0.5.0 GetoptLong_0.1.6
## [45] DelayedArray_0.6.0 kernlab_0.9-26
## [47] shape_1.4.4 DEoptimR_1.0-8
## [49] prabclus_2.2-6 scales_0.5.0.9000
## [51] mvtnorm_1.0-7 DBI_1.0.0
## [53] rngtools_1.3.1 Rcpp_0.12.17
## [55] xtable_1.8-2 progress_1.1.2
## [57] bit_1.1-13 mclust_5.4
## [59] stats4_3.5.0 htmlwidgets_1.2
## [61] httr_1.3.1 getopt_1.20.2
## [63] gplots_3.0.1 RColorBrewer_1.1-2
## [65] fpc_2.1-11 modeltools_0.2-21
## [67] pkgconfig_2.0.1 XML_3.98-1.11
## [69] flexmix_2.3-14 nnet_7.3-12
## [71] labeling_0.3 tidyselect_0.2.4
## [73] rlang_0.2.0.9001 reshape2_1.4.3
## [75] later_0.7.2 AnnotationDbi_1.42.1
## [77] munsell_0.4.3 tools_3.5.0
## [79] RSQLite_2.1.1 evaluate_0.10.1
## [81] stringr_1.3.1 heatmaply_0.14.1
## [83] yaml_2.1.19 bit64_0.9-7
## [85] robustbase_0.93-0 caTools_1.17.1
## [87] purrr_0.2.4 dendextend_1.8.0
## [89] bindrcpp_0.2.2 whisker_0.3-2
## [91] mime_0.5 slam_0.1-43
## [93] biomaRt_2.36.0 compiler_3.5.0
## [95] plotly_4.7.1.9000 tibble_1.4.2
## [97] stringi_1.2.2 GenomicFeatures_1.32.0
## [99] trimcluster_0.1-2 lattice_0.20-35
## [101] Matrix_1.2-14 pillar_1.2.2
## [103] GlobalOptions_0.0.13 data.table_1.11.2
## [105] cowplot_0.9.2 bitops_1.0-6
## [107] seriation_1.2-3 httpuv_1.4.3
## [109] rtracklayer_1.40.2 GenomicRanges_1.32.3
## [111] R6_2.2.2 TSP_1.1-6
## [113] promises_1.0.1 cometExactTest_0.1.5
## [115] KernSmooth_2.23-15 gridExtra_2.3
## [117] IRanges_2.14.10 codetools_0.2-15
## [119] gtools_3.5.0 MASS_7.3-50
## [121] assertthat_0.2.0 SummarizedExperiment_1.10.1
## [123] pkgmaker_0.22 rprojroot_1.3-2
## [125] rjson_0.2.19 withr_2.1.2
## [127] GenomicAlignments_1.16.0 Rsamtools_1.32.0
## [129] S4Vectors_0.18.2 GenomeInfoDbData_1.1.0
## [131] diptest_0.75-7 grid_3.5.0
## [133] tidyr_0.8.1 class_7.3-14
## [135] rmarkdown_1.9 shiny_1.1.0